
@bernstei
Collaborator

In particular, this fixes an issue where distributed mace_run_train processes overwrite each other's pretrained and combined data files. Other places with this kind of error may still exist, but I tried to fix it everywhere I could find it.
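For readers unfamiliar with the pattern, here is a minimal sketch of writing a file from the root process only. It is not the code in this PR: the helper name `write_from_root_only`, its parameters, and the temp-file-plus-rename step are all assumptions made for illustration, and it uses only the Python standard library so it can run without a distributed launcher.

```python
import os
import tempfile
from typing import Callable, Optional


def write_from_root_only(
    path: str,
    text: str,
    rank: int,
    barrier: Optional[Callable[[], None]] = None,
) -> bool:
    """Hypothetical helper: write `text` to `path` on rank 0 only.

    Returns True if the calling rank performed the write.
    """
    wrote = False
    if rank == 0:
        directory = os.path.dirname(os.path.abspath(path))
        # Write to a temporary file in the same directory, then rename it,
        # so other processes never observe a partially written file.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
        wrote = True
    # All ranks call the barrier (if one is given), so non-root ranks do not
    # read the file before rank 0 has finished writing it.
    if barrier is not None:
        barrier()
    return wrote
```

In a PyTorch distributed run, `rank` would typically come from `torch.distributed.get_rank()` and `barrier` could be `torch.distributed.barrier`. Every rank has to call the helper, otherwise the barrier would hang.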

closes #1309

Noam Bernstein added 4 commits December 18, 2025 10:46
In particular, fixes issue where distributed mace_run_train processes
overwrite each others' pretrained and combined data files
…as size 0, to avoid unused parameters error
…ce that other tasks see the completed write
@bernstei
Collaborator Author

Includes #1311, sort of accidentally.

@bernstei bernstei changed the title Write files from root process only when distributed Write files only from root process when distributed Dec 18, 2025

Development

Successfully merging this pull request may close these issues.

mace_run_train distributed processes overwrite each others' pretrained data files

2 participants